Abstract: In this paper, we review the parallel and distributed optimization algorithms based on the alternating direction method of multipliers (ADMM) for solving “big data” optimization problems in modern communication networks. We first introduce the canonical formulation of the large-scale optimization problem. Next, we describe the general form of ADMM and then focus on several direct extensions and sophisticated modifications of ADMM from 2-block to N-block settings to deal with the optimization problem. The iterative schemes and convergence properties of each extension/modification are given, and the implementation on large-scale computing facilities is also illustrated. Finally, we enumerate several applications in communication networks, such as the security constrained optimal power flow problem in smart grid networks and the mobile data offloading problem in software defined networks (SDNs).

I. Introduction 

Nowadays, modern communication networks play an important role in electric power systems, mobile cloud computing, smart city evolution and personal health care. The novel telecommunication technologies they employ make data collection much easier for power system operation and control, enable more efficient data transmission for mobile applications, and promise more intelligent sensing and monitoring for metropolitan city-regions. Meanwhile, we are witnessing an unprecedented rise in the volume, variety and velocity of information in modern communication networks. A large volume of data is generated by digital equipment such as mobile devices and computers, smart meters and household appliances, as well as surveillance cameras and sensor-equipped mass rapid transit around the city. The information explosion of big data in modern communication networks makes statistical and computational methods significantly important for data analysis, processing, and optimization. Network operators and service providers who can develop and exploit efficient methods to tackle big data challenges will ensure network security and resiliency, gain market share, increase revenue with distinctive quality of service, and achieve intelligent network operation and management.

The unprecedented “big data”, reinforced by communication and information technologies, presents us with opportunities and challenges. On one hand, the inferential power of algorithms that have been shown to be successful on modest-sized data sets may be amplified by the massive dataset. Data analytic methods for such unprecedented volumes of data promise personalized business model design, intelligent social network analysis, smart city development, efficient healthcare and medical data management, and smart grid evolution. On the other hand, the sheer volume of data makes it impractical to collect, store and process the dataset in a centralized fashion. Moreover, massive datasets are noisy, incomplete, heterogeneous, structured, prone to outliers, and vulnerable to cyber-attacks. The error rates, which are part and parcel of any inferential algorithm, may also be amplified by the massive data. Finally, “big data” problems often come with time constraints, where a medium-quality answer that is obtained quickly can be more useful than a high-quality answer that is obtained slowly. Overall, we are facing a problem in which the classic resources of computation, such as time, space, and energy, are intertwined in complex ways with the massive data resources.

With the era of “big data” comes the need for parallel and distributed algorithms for large-scale inference and optimization. Numerous problems in statistical and machine learning, compressed sensing, social network analysis, and computational biology are formulated as optimization problems with millions or billions of variables. Since classical optimization algorithms are not designed to scale to problems of this size, novel optimization algorithms are emerging to deal with problems in the “big data” setting. A non-exhaustive list of such algorithms includes the block coordinate descent method [1]-[3] (footnote 1), the stochastic gradient descent method [4]-[6], the dual coordinate ascent method [7], [8], the alternating direction method of multipliers (ADMM) [9], [10], and the Frank-Wolfe method (also known as the conditional gradient method) [11], [12]. Each type of algorithm on the list has its own strengths and weaknesses. The list is still growing and, owing to our limited knowledge and the fast-developing nature of this active field of research, many efficient algorithms are not mentioned here.

In this paper, we focus on the application of ADMM to the “big data” optimization problem in communication networks such as smart grids and software defined networks (SDNs). In particular, we consider parallel and distributed optimization algorithms based on ADMM for the following convex optimization problem in canonical form:

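$$
\begin{aligned}
\min_{x_1, x_2, \ldots, x_N} \quad & f(x) = f_1(x_1) + \cdots + f_N(x_N), \\
\text{s.t.} \quad & A_1 x_1 + \cdots + A_N x_N = c, \\
& x_i \in \mathcal{X}_i, \quad i = 1, \ldots, N,
\end{aligned}
\tag{1}
$$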

Footnote 1: [3] proposes a stochastic block coordinate descent method.


where $x = (x_1^T, \ldots, x_N^T)^T$, $\mathcal{X}_i \subseteq \mathbb{R}^{n_i}$ ($i = 1, 2, \ldots, N$) are closed convex sets, $A_i \in \mathbb{R}^{m \times n_i}$ ($i = 1, 2, \ldots, N$) are given matrices, $c \in \mathbb{R}^m$ is a given vector, and $f_i : \mathbb{R}^{n_i} \to \mathbb{R}$ ($i = 1, 2, \ldots, N$) are closed, convex, proper, but not necessarily smooth functions; the non-smooth functions are usually employed to enforce structure in the solution. Problem (1) can be extended to handle linear inequalities by introducing slack variables. Problem (1) finds wide application in smart grids, for example in distributed robust state estimation, network energy management and the security constrained optimal power flow problem, which we will illustrate later.

Though many algorithms can be applied to problem (1), we restrict our attention to the class of algorithms based on ADMM. The rest of this paper is organized as follows. Section II introduces the background of ADMM and its two direct extensions to N blocks for problem (1). The limitations of those direct extensions are also addressed. Section III gives three approaches to the multi-block setting for problem (1) with provable convergence, based on variable splitting, ADMM with Gaussian back substitution and proximal Jacobian ADMM, respectively. The applications of problem (1) in communication networks are described in Section IV. Specifically, we discuss two examples in detail: the security constrained optimal power flow problem in smart grid networks and the mobile data offloading problem in SDNs. Section V summarizes this paper.

II. ADMM Background 

In this section, we first introduce the general form of ADMM for an optimization problem analogous to (1) with only two blocks of functions and variables. After that, we describe two direct extensions of ADMM to the multi-block setting.

A. ADMM 

ADMM was proposed in [13], [14] and recently revisited by [10]. The general form of ADMM is expressed as

$$
\min_{x_1 \in \mathcal{X}_1,\; x_2 \in \mathcal{X}_2} \; f_1(x_1) + f_2(x_2) \quad \text{s.t.} \quad A_1 x_1 + A_2 x_2 = c. \tag{2}
$$

The augmented Lagrangian for (2) is

$$
\mathcal{L}_\rho(x_1, x_2, \lambda) = f_1(x_1) + f_2(x_2) - \lambda^T (A_1 x_1 + A_2 x_2 - c) + \frac{\rho}{2} \| A_1 x_1 + A_2 x_2 - c \|_2^2, \tag{3}
$$

where $\lambda \in \mathbb{R}^m$ is the Lagrangian multiplier and $\rho > 0$ is the parameter for the quadratic penalty of the constraints. The iterative scheme of ADMM embeds a Gauss-Seidel decomposition into the iterations of $x_1$ and $x_2$ as follows:

\begin{align}
x_1^{k+1} &= \arg\min_{x_1} \mathcal{L}_\rho(x_1, x_2^k, \lambda^k), \tag{4} \\
x_2^{k+1} &= \arg\min_{x_2} \mathcal{L}_\rho(x_1^{k+1}, x_2, \lambda^k), \tag{5} \\
\lambda^{k+1} &= \lambda^k - \rho (A_1 x_1^{k+1} + A_2 x_2^{k+1} - c), \tag{6}
\end{align}

where in each iteration, the augmented Lagrangian is minimized over $x_1$ and $x_2$ separately. In (4) and (5), the functions $f_1$ and $f_2$ as well as the variables $x_1$ and $x_2$ are treated individually.


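Algorithm 1 Two-block ADMM
Initialize: $x^0$, $\lambda^0$, $\rho > 0$;
for $k = 0, 1, \ldots$ do
    $x_1^{k+1} = \arg\min_{x_1} \mathcal{L}_\rho(x_1, x_2^k, \lambda^k)$;
    $x_2^{k+1} = \arg\min_{x_2} \mathcal{L}_\rho(x_1^{k+1}, x_2, \lambda^k)$;
    $\lambda^{k+1} = \lambda^k - \rho (A_1 x_1^{k+1} + A_2 x_2^{k+1} - c)$;
end for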




Fig. 1. Comparison of Gauss-Seidel update and Jacobian update. 


Treating the blocks individually yields easier subproblems, a feature that is attractive and advantageous for a broad spectrum of applications. The convergence of ADMM for convex optimization problems with two blocks of variables and functions has been proved in [9], [10], and the iterative scheme is illustrated in Algorithm 1. Algorithm 1 can deal with the multi-block case when auxiliary variables are introduced, which will be described in Section III-A.
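As a minimal sketch of Algorithm 1, assume the toy objectives $f_i(x_i) = \frac{1}{2}\|x_i - b_i\|_2^2$ with $\mathcal{X}_i = \mathbb{R}^{n_i}$. This choice is made here only because both subproblems then have closed-form solutions; it is not taken from the paper. The function names, the random data and the iteration count are hypothetical.

```python
import numpy as np

def exact_solution(A_blocks, b_blocks, c):
    """Reference solution of min 0.5*||x - b||^2 s.t. A x = c via the KKT system."""
    A = np.hstack(A_blocks)
    b = np.concatenate(b_blocks)
    nu = np.linalg.solve(A @ A.T, A @ b - c)
    return b - A.T @ nu

def two_block_admm(A1, A2, b1, b2, c, rho=1.0, iters=3000):
    """Algorithm 1 for the assumed objectives f_i(x_i) = 0.5*||x_i - b_i||^2."""
    x1, x2 = np.zeros(A1.shape[1]), np.zeros(A2.shape[1])
    lam = np.zeros(c.shape[0])
    # The matrices of the two x-updates do not change, so build them once.
    H1 = np.eye(A1.shape[1]) + rho * A1.T @ A1
    H2 = np.eye(A2.shape[1]) + rho * A2.T @ A2
    for _ in range(iters):
        # x1-update (4): minimize L_rho over x1 with x2 and lambda fixed.
        x1 = np.linalg.solve(H1, b1 + A1.T @ (lam - rho * (A2 @ x2 - c)))
        # x2-update (5): uses the freshly computed x1 (Gauss-Seidel order).
        x2 = np.linalg.solve(H2, b2 + A2.T @ (lam - rho * (A1 @ x1 - c)))
        # Multiplier update (6), with the sign convention of (3).
        lam = lam - rho * (A1 @ x1 + A2 @ x2 - c)
    return x1, x2, lam

# Hypothetical random instance with m = 3 constraints.
rng = np.random.default_rng(0)
A1, A2 = rng.standard_normal((3, 6)), rng.standard_normal((3, 7))
b1, b2, c = rng.standard_normal(6), rng.standard_normal(7), rng.standard_normal(3)
x1, x2, lam = two_block_admm(A1, A2, b1, b2, c)
x_ref = exact_solution([A1, A2], [b1, b2], c)
print(np.linalg.norm(np.concatenate([x1, x2]) - x_ref))
```

For these quadratic objectives each update is a small linear solve; with non-smooth $f_i$ the same loop would call a proximal or other subproblem solver instead.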

B. Direct Extensions to Multi-block Setting 

ADMM promises to solve the optimization problem (1) with the same philosophy as Algorithm 1. In the following, we present two kinds of direct extensions of ADMM to the multi-block setting, Gauss-Seidel and Jacobian. The comparison of these two updates is shown in Figure 1. To be specific, we first give the augmented Lagrangian function of problem (1):

$$
\mathcal{L}_\rho(x_1, \ldots, x_N, \lambda) = \sum_{i=1}^N f_i(x_i) - \lambda^T \Big( \sum_{i=1}^N A_i x_i - c \Big) + \frac{\rho}{2} \Big\| \sum_{i=1}^N A_i x_i - c \Big\|_2^2, \tag{7}
$$

where a penalty on the linear constraints is added to the Lagrangian function and $\rho$ is the penalty parameter.

1) Gauss-Seidel: Intuitively, a natural extension of the classical Gauss-Seidel ADMM from 2 blocks to N blocks is a straightforward replacement of the two-block alternating minimization scheme by a sweep of updates of $x_i$ for $i = 1, 2, \ldots, N$ sequentially. In particular, at iteration $k$, the update scheme for $x_i$ is

$$
x_i^{k+1} = \arg\min_{x_i} \mathcal{L}_\rho(\{x_j^{k+1}\}_{j<i}, x_i, \{x_j^k\}_{j>i}, \lambda^k), \tag{8}
$$




















Algorithm 2 Gauss-Seidel Multi-block ADMM
Initialize: $x^0$, $\lambda^0$, $\rho > 0$;
for $k = 0, 1, \ldots$ do
    for $i = 1, \ldots, N$ do
        {$x_i$ is updated sequentially.}
        $x_i^{k+1} = \arg\min_{x_i} \mathcal{L}_\rho(\{x_j^{k+1}\}_{j<i}, x_i, \{x_j^k\}_{j>i}, \lambda^k)$;
    end for
    $\lambda^{k+1} = \lambda^k - \rho (\sum_{i=1}^N A_i x_i^{k+1} - c)$;
end for


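where $\{x_j\}_{j<i}$ denotes the set of blocks updated before block $i$ and $\{x_j\}_{j>i}$ the set of blocks updated after it. The augmented Lagrangian function (7) is thus split and minimized over the blocks in an alternating fashion. The direct Gauss-Seidel type extension is illustrated in Algorithm 2.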
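The sequential sweep of Algorithm 2 can be sketched as follows, again assuming the illustrative objectives $f_i(x_i) = \frac{1}{2}\|x_i - b_i\|_2^2$ so that every block update is a linear solve. The demo instance ($N = 3$, $A_i = I$, $\rho = 1$) is a hypothetical choice on which the iteration happens to converge; it does not imply convergence of the direct extension in general.

```python
import numpy as np

def gauss_seidel_admm(A, b, c, rho=1.0, iters=200):
    """Algorithm 2 for the assumed objectives f_i(x_i) = 0.5*||x_i - b_i||^2."""
    N = len(A)
    x = [np.zeros(Ai.shape[1]) for Ai in A]
    lam = np.zeros(c.shape[0])
    for _ in range(iters):
        for i in range(N):
            # x[j] already holds x_j^{k+1} for j < i and still holds x_j^k for j > i.
            others = sum(A[j] @ x[j] for j in range(N) if j != i)
            H = np.eye(A[i].shape[1]) + rho * A[i].T @ A[i]
            x[i] = np.linalg.solve(H, b[i] + A[i].T @ (lam - rho * (others - c)))
        # Multiplier update after the full sweep.
        lam = lam - rho * (sum(A[i] @ x[i] for i in range(N)) - c)
    return x, lam

# Hypothetical toy instance: three blocks with A_i = I.
n = 4
A = [np.eye(n) for _ in range(3)]
b = [np.full(n, 1.0), np.full(n, 2.0), np.full(n, 3.0)]
c = np.ones(n)
x, lam = gauss_seidel_admm(A, b, c)
residual = np.linalg.norm(sum(x) - c)
print(residual)
```

Because block $i$ reads the blocks updated earlier in the same sweep, the inner loop cannot be parallelized as written.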

Remark: Algorithm 2 has been utilized in practical problems [15]-[17] despite the lack of a rigorous convergence proof. In fact, the convergence of Gauss-Seidel multi-block ADMM was not well understood for a long time: neither an affirmative convergence proof nor a counterexample showing convergence failure was available in the literature. Recently, [18] has shown that the direct Gauss-Seidel extension of ADMM to multiple blocks is not necessarily convergent. [19] proves the convergence of Algorithm 2 with a sufficiently small step size for the Lagrangian multiplier update and additional assumptions on problem (1). [20] conjectures that an independent uniform random permutation of the update order of the blocks in each iteration will result in a convergent iteration scheme. [21], [22] propose slightly modified versions of Algorithm 2 with provable convergence and competitive iteration simplicity and computing efficiency, which we will illustrate later in Section III-B.

2) Jacobian: Another possible iterative scheme for the N-block ADMM is the Jacobian type update, which performs the update of $x_i$ in a parallel coordinate fashion for $i = 1, \ldots, N$. In particular, the update of $x_i$ is calculated as

$$
x_i^{k+1} = \arg\min_{x_i} \mathcal{L}_\rho(x_i, \{x_j^k\}_{j \neq i}, \lambda^k), \tag{9}
$$

where $\{x_j^k\}_{j \neq i}$ denotes the set of all blocks except $x_i$. Unlike the iterative scheme of Algorithm 2, in which the updates of $x_i$ have to be performed sequentially one after another, the updates in the Jacobian ADMM can be performed concurrently, i.e., all $x_i$ can be updated in parallel. This advantage makes the Jacobian type ADMM preferred for parallel implementation. The direct Jacobian type extension is illustrated in Algorithm 3.

Remark: Though Algorithm 3 is more computationally efficient in the sense of parallelization, [23] shows that Algorithm 3 is not necessarily convergent in the general case, even in the 2-block case. [24] proves that if the matrices $A_i$ are mutually near-orthogonal and have full column rank, Algorithm 3 converges globally. A proximal Jacobian ADMM with provable convergence is also proposed in [24], which we will illustrate later in Section III-C.


Algorithm 3 Jacobian Multi-block ADMM
Initialize: $x^0$, $\lambda^0$, $\rho > 0$;
for $k = 0, 1, \ldots$ do
    for $i = 1, \ldots, N$ do
        {$x_i$ is updated concurrently.}
        $x_i^{k+1} = \arg\min_{x_i} \mathcal{L}_\rho(x_i, \{x_j^k\}_{j \neq i}, \lambda^k)$;
    end for
    $\lambda^{k+1} = \lambda^k - \rho (\sum_{i=1}^N A_i x_i^{k+1} - c)$;
end for
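A minimal sketch of Algorithm 3, assuming the illustrative objectives $f_i(x_i) = \frac{1}{2}\|x_i - b_i\|_2^2$. On the hypothetical instance below ($N = 3$, $A_i = I$, $\rho = 1$) the constraint residual grows from sweep to sweep, which is consistent with the remark that the direct Jacobian extension is not necessarily convergent. The function name, data and iteration count are chosen here for illustration only.

```python
import numpy as np

def jacobi_admm(A, b, c, rho=1.0, iters=30):
    """Algorithm 3 for assumed f_i = 0.5*||x_i - b_i||^2; also returns residual norms."""
    N = len(A)
    x = [np.zeros(Ai.shape[1]) for Ai in A]
    lam = np.zeros(c.shape[0])
    history = []
    for _ in range(iters):
        Ax = [A[i] @ x[i] for i in range(N)]   # every product uses the old iterate x^k
        total = sum(Ax)
        x_new = []
        for i in range(N):                      # independent updates: parallelizable
            others = total - Ax[i]
            H = np.eye(A[i].shape[1]) + rho * A[i].T @ A[i]
            x_new.append(np.linalg.solve(H, b[i] + A[i].T @ (lam - rho * (others - c))))
        x = x_new
        r = sum(A[i] @ x[i] for i in range(N)) - c
        lam = lam - rho * r
        history.append(np.linalg.norm(r))
    return x, lam, history

# Hypothetical toy instance: three blocks with A_i = I.
n = 4
A = [np.eye(n) for _ in range(3)]
b = [np.full(n, 1.0), np.full(n, 2.0), np.full(n, 3.0)]
c = np.ones(n)
x, lam, history = jacobi_admm(A, b, c)
print(history[0], history[-1])
```

The inner loop never reads a block updated in the same sweep, which is what makes the scheme parallel and, on this instance, unstable.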


III. Multi-block ADMM 

In this section, we introduce several sophisticated modifications of ADMM for the multi-block setting: variable splitting ADMM [9], [10], [25], ADMM with Gaussian back substitution [21], [26], and proximal Jacobian ADMM [24], [27].

A. Variable Splitting ADMM 

To solve the optimization problem (1), we can apply variable splitting [9], [10], [25] to deal with the multi-block variables. In particular, the optimization problem (1) can be reformulated by introducing an auxiliary variable $z$:

$$
\begin{aligned}
\min_{x, z} \quad & \sum_{i=1}^N f_i(x_i) + I_{\mathcal{Z}}(z), \\
\text{s.t.} \quad & A_i x_i + z_i = \frac{c}{N}, \quad i = 1, \ldots, N,
\end{aligned}
\tag{10}
$$

where $z = (z_1^T, \ldots, z_N^T)^T$ is partitioned conformably according to $x$, and $I_{\mathcal{Z}}(z)$ is the indicator function of the convex set $\mathcal{Z}$, i.e., $I_{\mathcal{Z}}(z) = 0$ for $z \in \mathcal{Z} = \{z \mid \sum_{i=1}^N z_i = 0\}$ and $I_{\mathcal{Z}}(z) = \infty$ otherwise. The augmented Lagrangian function is

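$$
\mathcal{L}_\rho = \sum_{i=1}^N f_i(x_i) + I_{\mathcal{Z}}(z) - \sum_{i=1}^N \lambda_i^T \Big( A_i x_i + z_i - \frac{c}{N} \Big) + \frac{\rho}{2} \sum_{i=1}^N \Big\| A_i x_i + z_i - \frac{c}{N} \Big\|_2^2, \tag{11}
$$

where we have two groups of variables, $\{x_1, \ldots, x_N\}$ and $\{z_1, \ldots, z_N\}$. Hence, we can apply the two-block ADMM to update these two groups of variables iteratively, i.e., we first update the group $\{x_i\}$ and then the group $\{z_i\}$. Within each group, the $x_i$ and the $z_i$ can be updated concurrently in parallel at each iteration. In particular, the update rules for $x_i$ and $z_i$ are

$$
\begin{aligned}
x_i^{k+1} &= \arg\min_{x_i} \mathcal{L}_\rho(x_i, z_i^k, \lambda_i^k), \\
z_i^{k+1} &= \arg\min_{z_i} \mathcal{L}_\rho(x_i^{k+1}, z_i, \lambda_i^k), \\
\lambda_i^{k+1} &= \lambda_i^k - \rho \Big( A_i x_i^{k+1} + z_i^{k+1} - \frac{c}{N} \Big), \quad \forall i = 1, \ldots, N.
\end{aligned}
\tag{12}
$$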

The variable splitting ADMM is illustrated in Algorithm 4. The relationship between this splitting scheme and the Jacobian splitting scheme has been outlined in [27]. Algorithm 4 enjoys the convergence rate of the 2-block ADMM. However, the number of variables and constraints increases substantially when N is large, which reduces efficiency and incurs a significant computational burden.












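Algorithm 4 Variable Splitting Multi-block ADMM
Initialize: $x^0$, $z^0$, $\lambda^0$, $\rho > 0$;
for $k = 0, 1, \ldots$ do
    for $i = 1, \ldots, N$ do
        {$x_i$, $z_i$ and $\lambda_i$ are updated concurrently.}
        $x_i^{k+1} = \arg\min_{x_i} \mathcal{L}_\rho(x_i, z_i^k, \lambda_i^k)$;
        $z_i^{k+1} = \arg\min_{z_i} \mathcal{L}_\rho(x_i^{k+1}, z_i, \lambda_i^k)$;
        $\lambda_i^{k+1} = \lambda_i^k - \rho (A_i x_i^{k+1} + z_i^{k+1} - \frac{c}{N})$;
    end for
end for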
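A minimal sketch of Algorithm 4 under the illustrative assumption $f_i(x_i) = \frac{1}{2}\|x_i - b_i\|_2^2$. With this choice the $x$-step is a linear solve per block, and the $z$-step reduces to a Euclidean projection onto $\mathcal{Z} = \{z \mid \sum_i z_i = 0\}$, i.e., subtracting the block average. The function names and the random data are hypothetical.

```python
import numpy as np

def exact_solution(A_blocks, b_blocks, c):
    """Reference solution of min 0.5*||x - b||^2 s.t. A x = c via the KKT system."""
    A = np.hstack(A_blocks)
    b = np.concatenate(b_blocks)
    nu = np.linalg.solve(A @ A.T, A @ b - c)
    return b - A.T @ nu

def splitting_admm(A, b, c, rho=1.0, iters=5000):
    """Algorithm 4 for the assumed objectives f_i(x_i) = 0.5*||x_i - b_i||^2."""
    N, m = len(A), c.shape[0]
    x = [np.zeros(Ai.shape[1]) for Ai in A]
    z = np.zeros((N, m))      # row i is z_i; the rows sum to zero
    lam = np.zeros((N, m))    # row i is lambda_i
    H = [np.eye(Ai.shape[1]) + rho * Ai.T @ Ai for Ai in A]
    for _ in range(iters):
        # x-step: the N subproblems are independent given (z, lambda).
        for i in range(N):
            x[i] = np.linalg.solve(H[i], b[i] + A[i].T @ (lam[i] - rho * (z[i] - c / N)))
        Ax = np.array([A[i] @ x[i] for i in range(N)])
        # z-step: project v_i = c/N - A_i x_i + lambda_i/rho onto {z : sum_i z_i = 0}.
        v = c / N - Ax + lam / rho
        z = v - v.mean(axis=0)
        # Multiplier step for each constraint A_i x_i + z_i = c/N.
        lam = lam - rho * (Ax + z - c / N)
    return x, z, lam

# Hypothetical random instance with N = 3 blocks and m = 3 constraints.
rng = np.random.default_rng(1)
A = [rng.standard_normal((3, n_i)) for n_i in (5, 6, 7)]
b = [rng.standard_normal(n_i) for n_i in (5, 6, 7)]
c = rng.standard_normal(3)
x, z, lam = splitting_admm(A, b, c)
x_ref = exact_solution(A, b, c)
print(np.linalg.norm(np.concatenate(x) - x_ref))
```

The only coupling between blocks is the average in the $z$-step, which in a distributed setting would correspond to one reduction per iteration.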



B. ADMM with Gaussian Back Substitution 


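Many efforts have been made to improve the convergence of the Gauss-Seidel type multi-block ADMM [21], [22]. In this part, we describe the ADMM with Gaussian back substitution [21], which asserts that if a new iterate is generated by correcting the output of Algorithm 2 with a Gaussian back substitution procedure, then the sequence of iterates converges to a solution of problem (1). We first define the vector $v = (x_2^T, \ldots, x_N^T, \lambda^T)^T$, the vector $\tilde{v} = (\tilde{x}_2^T, \ldots, \tilde{x}_N^T, \tilde{\lambda}^T)^T$, the matrix $H = \mathrm{diag}(\rho A_2^T A_2, \ldots, \rho A_N^T A_N, \frac{1}{\rho} I_m)$, and $M$ as

$$
M =
\begin{pmatrix}
\rho A_2^T A_2 & 0 & \cdots & \cdots & 0 \\
\rho A_3^T A_2 & \rho A_3^T A_3 & \ddots & & \vdots \\
\vdots & \vdots & \ddots & 0 & \vdots \\
\rho A_N^T A_2 & \rho A_N^T A_3 & \cdots & \rho A_N^T A_N & 0 \\
0 & 0 & \cdots & 0 & \frac{1}{\rho} I_m
\end{pmatrix}.
\tag{13}
$$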


Each iteration of the ADMM with Gaussian back substitution consists of two procedures: a prediction procedure and a correction procedure. The prediction $\tilde{v}^k$ is generated by Algorithm 2. In particular, $\tilde{x}_i$ is updated sequentially as

$$
\tilde{x}_i^k = \arg\min_{x_i} \mathcal{L}_\rho(\{\tilde{x}_j^k\}_{j<i}, x_i, \{x_j^k\}_{j>i}, \lambda^k), \tag{14}
$$

where the prediction procedure is performed in a forward manner, i.e., from the first block to the last block and then to the Lagrangian multiplier. Note that the newly generated $\tilde{x}_i$ are used in the update of the next block, in accordance with the Gauss-Seidel update fashion. After the update of the Lagrangian multiplier, the correction procedure is performed to update $v$ as

$$
H^{-1} M^T (v^{k+1} - v^k) = \alpha (\tilde{v}^k - v^k), \tag{15}
$$

where $H^{-1} M^T$ is an upper-triangular block matrix according to the definitions of $H$ and $M$. This implies that the correction procedure proceeds in a backward fashion, i.e., it first updates the Lagrangian multiplier and then updates $x_i$ from the last block to the first block sequentially. Note that the additional assumption that $A_i^T A_i$ ($i = 1, 2, \ldots, N$) are nonsingular is made here. $x_1$ serves as an intermediate variable and is unchanged during the correction procedure. The algorithm is illustrated in Algorithm 5.


Algorithm 5 ADMM with Gaussian Back Substitution
Initialize: $x^0$, $\tilde{x}^0$, $\lambda^0$, $\tilde{\lambda}^0$, $\rho > 0$, $\alpha \in (0, 1)$;
for $k = 0, 1, \ldots$ do
    for $i = 1, \ldots, N$ do
        {$\tilde{x}_i$ is updated sequentially.}
        $\tilde{x}_i^k = \arg\min_{x_i} \mathcal{L}_\rho(\{\tilde{x}_j^k\}_{j<i}, x_i, \{x_j^k\}_{j>i}, \lambda^k)$;
    end for
    $\tilde{\lambda}^k = \lambda^k - \rho (\sum_{i=1}^N A_i \tilde{x}_i^k - c)$;
    {Gaussian back substitution correction step}
    $H^{-1} M^T (v^{k+1} - v^k) = \alpha (\tilde{v}^k - v^k)$;
    $x_1^{k+1} = \tilde{x}_1^k$;
end for


The global convergence of the ADMM with Gaussian back substitution is proved in [21], and the convergence rate and iteration complexity are addressed in [26].

C. Proximal Jacobian ADMM 

The other type of modification of ADMM for the multi-block setting is based on the Jacobian iteration scheme [23], [24], [27], [28]. Since the Gauss-Seidel update is performed sequentially and is not amenable to parallelization, the Jacobian type iteration is preferred for distributed and parallel optimization. In this subsection, we describe the proximal Jacobian ADMM [24], in which, compared with Algorithm 3, a proximal term [29] is added to the update to improve convergence. In particular, the update of $x_i$ is

$$
x_i^{k+1} = \arg\min_{x_i} \mathcal{L}_\rho(x_i, \{x_j^k\}_{j \neq i}, \lambda^k) + \frac{1}{2} \| x_i - x_i^k \|_{P_i}^2, \tag{16}
$$

where $\|x_i\|_{P_i}^2 = x_i^T P_i x_i$ for some symmetric and positive semi-definite matrix $P_i \succeq 0$. The proximal term can make the subproblem of $x_i$ strictly or strongly convex, and thus more stable. Moreover, suitable choices of $P_i$ can make the subproblems easier to solve. The update of the Lagrangian multiplier is

$$
\lambda^{k+1} = \lambda^k - \gamma \rho \Big( \sum_{i=1}^N A_i x_i^{k+1} - c \Big), \tag{17}
$$

where $\gamma > 0$ is the damping parameter. The algorithm is illustrated in Algorithm 6.

The global convergence of the proximal Jacobian ADMM is proved in [24]. Moreover, it enjoys a convergence rate of $o(1/k)$ under conditions on $P_i$ and $\gamma$.
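A minimal sketch of the proximal Jacobian updates (16) and (17), assuming the illustrative objectives $f_i(x_i) = \frac{1}{2}\|x_i - b_i\|_2^2$ and the simple proximal matrix $P_i = \tau I$. The parameter `tau`, its value and the toy instance ($N = 3$, $A_i = I$) are hypothetical choices made here; the precise conditions on $P_i$ and $\gamma$ are those of [24] and are not reproduced in this sketch.

```python
import numpy as np

def proximal_jacobi_admm(A, b, c, rho=1.0, gamma=1.0, tau=3.0, iters=300):
    """Updates (16)-(17) with assumed f_i = 0.5*||x_i - b_i||^2 and P_i = tau * I."""
    N = len(A)
    x = [np.zeros(Ai.shape[1]) for Ai in A]
    lam = np.zeros(c.shape[0])
    for _ in range(iters):
        Ax = [A[i] @ x[i] for i in range(N)]   # products with the old iterate x^k
        total = sum(Ax)
        x_new = []
        for i in range(N):                      # independent updates: parallelizable
            others = total - Ax[i]
            # Stationarity of (16): ((1 + tau) I + rho A_i^T A_i) x_i = rhs.
            H = (1.0 + tau) * np.eye(A[i].shape[1]) + rho * A[i].T @ A[i]
            rhs = b[i] + A[i].T @ (lam - rho * (others - c)) + tau * x[i]
            x_new.append(np.linalg.solve(H, rhs))
        x = x_new
        # Damped multiplier update (17).
        lam = lam - gamma * rho * (sum(A[i] @ x[i] for i in range(N)) - c)
    return x, lam

# Hypothetical toy instance: three blocks with A_i = I.
n = 4
A = [np.eye(n) for _ in range(3)]
b = [np.full(n, 1.0), np.full(n, 2.0), np.full(n, 3.0)]
c = np.ones(n)
x, lam = proximal_jacobi_admm(A, b, c)
residual = np.linalg.norm(sum(x) - c)
print(residual)
```

The term `tau * x[i]` pulls each block toward its previous value, which damps the simultaneous updates while keeping them fully parallel.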

D. Implementations 

The recent development in high performance computing (HPC) and the cloud computing paradigm provides a flexible and efficient solution for deploying large-scale optimization algorithms. In this part, we describe possible implementation approaches for those distributed and parallel algorithms on current mainstream large-scale computing facilities.

One possible implementation utilizes available computing-intensive techniques and tools like MPI, OpenMP, and










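Algorithm 6 Proximal Jacobian ADMM
Initialize: $x^0$, $\lambda^0$, $\rho > 0$, $\gamma > 0$;
for $k = 0, 1, \ldots$ do
    for $i = 1, \ldots, N$ do
        {$x_i$ is updated concurrently.}
        $x_i^{k+1} = \arg\min_{x_i} \mathcal{L}_\rho(x_i, \{x_j^k\}_{j \neq i}, \lambda^k) + \frac{1}{2} \| x_i - x_i^k \|_{P_i}^2$;
    end for
    $\lambda^{k+1} = \lambda^k - \gamma \rho (\sum_{i=1}^N A_i x_i^{k+1} - c)$;
end for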


OpenCL. MPI is a language-independent protocol used for inter-process communication on distributed memory computing platforms, and is widely used for high-performance parallel computing today. The (multi-block) ADMM using MPI has been implemented in [10] and [30]. Besides, OpenMP, which is a shared memory multiprocessing parallel computing paradigm, and OpenCL, which is a heterogeneous distributed-shared memory parallel computing paradigm that incorporates CPUs and GPUs, also promise to implement distributed and parallel optimization algorithms on HPC. It is expected that supercomputers will reach one exaflops ($10^{18}$ FLOPS) and even zettaflops ($10^{21}$ FLOPS) in the near future, which will largely enhance the computing capacity and significantly expedite program execution.

Another possible approach exploits easy-to-use cloud computing engines like Hadoop MapReduce and Apache Spark. The amount of cloud infrastructure available for Hadoop MapReduce makes it convenient to use for large problems, though it is awkward to express ADMM in MapReduce since it is not designed for iterative tasks. Apache Spark’s in-memory computing feature enables it to run iterative optimizations much faster than Hadoop, and it is now prevalent for large-scale machine learning and optimization tasks on clusters [31]. This implementation approach is much simpler than the previous computing-intensive techniques and tools, and is promising for the implementation of large-scale distributed and parallel computation algorithms based on ADMM. The advances in cloud/cluster computing engines provide a simple method to implement large-scale data processing, and recently Google, Baidu and Alibaba have also been developing and deploying massive cluster computing engines to perform large-scale distributed and parallel computation.

We have now finished the review of distributed and parallel optimization methods based on ADMM, and we summarize the relationships among all the algorithms in Figure 2.

IV. Communication Network Applications 

In this section, we review several applications of distributed and parallel optimization in communication networks. In particular, we describe the security constrained optimal power flow problem [32], [33] in smart grids and the mobile data offloading problem in SDN [34], both based on ADMM.



Fig. 2. An illustration of the relationships among Algorithms. (Figure not reproduced; recoverable labels: convergent as in the 2-block setting; globally convergent; globally convergent with a convergence rate of o(1/k).)



Fig. 3. Example for the security constrained optimal power flow problem. 

A. Security Constrained Optimal Power Flow 

In this subsection, we consider the distributed and parallel approach for the security constrained optimal power flow (SCOPF) problem [32], [33]. The SCOPF is an extension of the conventional optimal power flow (OPF) problem, whose objective is to determine a generation schedule that minimizes the system operating cost while satisfying system operation constraints such as hourly load demand, fuel limitations, environmental constraints and network security requirements.

An illustrative example of SCOPF is shown in Figure 3. There are 3 buses, lines with a limit of 300 MW, 2 generators (330 MW and 120 MW) and a load of 450 MW in the system. The left part of the figure is an example of the traditional optimal power flow (OPF) problem, which does not consider the security constraint. If the line between buses 1 and 2 breaks, the line between buses 1 and 3 cannot carry 330 MW, which exceeds its 300 MW limit, and consequently it breaks too. Then generator B cannot supply the load alone, and as a result the line between buses 2 and 3 breaks. This example shows why we need to consider the security constraint so as to avoid large-area blackouts.

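In [32], the general form of SCOPF is formulated as follows:

\begin{align}
\min_{x^0, \ldots, x^C;\; u^0, \ldots, u^C} \quad & f^0(x^0, u^0) \tag{18} \\
\text{s.t.} \quad & g^0(x^0, u^0) = 0, \tag{19} \\
& h^0(x^0, u^0) \le 0, \tag{20} \\
& g^c(x^c, u^c) = 0, \tag{21} \\
& h^c(x^c, u^c) \le 0, \text{ and} \tag{22} \\
& |u^0 - u^c| \le \Delta_c, \quad c = 1, \ldots, C, \tag{23}
\end{align}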

where $f^0$ is the objective function in (18), which aims to maximize the total social welfare or, equivalently, minimize the offer-based energy and production cost; $x^c$ is the vector of state variables, which includes voltage magnitudes and angles at all buses; and $u^c$ is the vector of control variables, which can be generator real powers or terminal voltages. The superscript $c = 0$ corresponds to the pre-contingency configuration, and $c = 1, \ldots, C$ correspond to the different post-contingency configurations. In addition, $\Delta_c$ is the maximum allowed adjustment between the normal and contingency states for contingency $c$.

In the conventional SCOPF problem, the equality constraints (21) on $g^c$, $c = 0, \ldots, C$, represent the system nodal power flow balance over the entire grid, and the inequality constraints (22) on $h^c$, $c = 0, \ldots, C$, represent the physical limits on the equipment, such as the operational limits on the branch currents and bounds on the generator power outputs. Constraints (19)-(20) capture the economic dispatch and enforce the feasibility of the pre-contingency state. Constraints (21)-(22) incorporate the security-constrained dispatch and enforce the feasibility of the post-contingency state. Constraint (23) introduces the security-constrained dispatch with rescheduling, which couples the control variables of the pre-contingency and post-contingency states and prevents unrealistic post-contingency corrective actions. Note that there are some variations on the objective function and constraints of the SCOPF problem, and we focus on the above conventional formulation in this subsection.

Following the standard approach to formulating the SCOPF problem, the objective here is to minimize the cost of generation while safeguarding the power system sustainability. For the sake of simplicity and computational tractability, constraints (19)-(22) are modeled with the linear DC load flow, and we assume that the list of contingencies is given. Thus, assuming DC power network modeling and neglecting all shunt elements, the standard SCOPF problem can be simplified to the following optimization problem:


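\begin{align}
\min_{\theta^0, \ldots, \theta^C;\; P^{g,0}, \ldots, P^{g,C}} \quad & \sum_{i \in \mathcal{G}} f_i^g(P_i^{g,0}) \tag{24} \\
\text{s.t.} \quad & B_{bus}^0 \theta^0 + P^{d,0} - A^{g,0} P^{g,0} = 0, \tag{25} \\
& B_{bus}^c \theta^c + P^{d,c} - A^{g,c} P^{g,c} = 0, \tag{26} \\
& |B_f^0 \theta^0| - F^{max} \le 0, \tag{27} \\
& |B_f^c \theta^c| - F^{max} \le 0, \tag{28} \\
& \underline{P}^{g,0} \le P^{g,0} \le \overline{P}^{g,0}, \tag{29} \\
& \underline{P}^{g,c} \le P^{g,c} \le \overline{P}^{g,c}, \tag{30} \\
& |P^{g,0} - P^{g,c}| \le \Delta_c, \text{ and} \tag{31} \\
& i \in \mathcal{G}, \quad c = 1, \ldots, C, \tag{32}
\end{align}

where the notation is given in Table I.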

The solution to (24) ensures economical dispatch while guaranteeing power system security by taking into account a set of postulated contingencies. The major challenge of SCOPF is the problem size, especially for large systems with numerous contingency cases to be considered. Directly solving the SCOPF problem by simultaneously imposing all post-contingency constraints will result in prohibitive memory

TABLE I
Notation definitions.

$\mathcal{G}$: Set of generators
$\mathcal{N}$: Set of buses
$\mathcal{B}$: Set of branches
$\theta^c \in \mathbb{R}^{|\mathcal{N}|}$: Vector of voltage angles
$P^{g,c} \in \mathbb{R}^{|\mathcal{G}|}$: Vector of real power flows
$f_i^g$: Generation cost function
$P_i^{g,0}$: Displaceable real power of each individual generation unit $i$ for the pre-contingency configuration
$B_{bus}^c \in \mathbb{R}^{|\mathcal{N}| \times |\mathcal{N}|}$: Power network system admittance matrix
$B_f^c \in \mathbb{R}^{|\mathcal{B}| \times |\mathcal{N}|}$: Branch admittance matrix
$P^{d,c} \in \mathbb{R}^{|\mathcal{N}|}$: Real power demand
$A^{g,c} \in \mathbb{R}^{|\mathcal{N}| \times |\mathcal{G}|}$: Sparse generator connection matrix, whose $(i, j)$-th element is 1 if generator $j$ is located at bus $i$ and 0 otherwise
$F^{max}$: Vector for the maximum power flow
$\overline{P}^{g,c}$: Upper bound on real power generation
$\underline{P}^{g,c}$: Lower bound on real power generation
$\Delta_c$: Pre-defined maximum allowed variation of power outputs


requirements and a substantial CPU burden. The proposed distributed optimization method is based on ADMM. However, the optimization problem (24) cannot be readily solved using ADMM, since constraint (31) couples the pre-contingency and post-contingency variables, and the inequalities make the problem even more complicated. To address these challenges, the optimization problem (24) is reformulated by introducing a slack variable $p_c \in \mathbb{R}^{|\mathcal{G}|}$:

\begin{align}
\text{minimize} \quad & \text{(24)} \tag{33} \\
\text{subject to} \quad & \text{constraints (25)-(30)}, \tag{34} \\
& P^{g,0} - P^{g,c} + p_c = \Delta_c, \text{ and} \tag{35} \\
& 0 \le p_c \le 2\Delta_c, \quad c = 1, \ldots, C. \tag{36}
\end{align}
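The role of the slack variable can be checked componentwise. The short derivation below is added for illustration and uses only the coupling constraint (31) and the reformulated constraints (35) and (36).

```latex
\begin{align*}
|P^{g,0} - P^{g,c}| \le \Delta_c
  &\iff -\Delta_c \le P^{g,0} - P^{g,c} \le \Delta_c \\
  &\iff 0 \le \Delta_c - P^{g,0} + P^{g,c} \le 2\Delta_c .
\end{align*}
Defining $p_c := \Delta_c - P^{g,0} + P^{g,c}$ gives
\begin{align*}
P^{g,0} - P^{g,c} + p_c = \Delta_c, \qquad 0 \le p_c \le 2\Delta_c .
\end{align*}
```

The absolute-value inequality is thus replaced by one linear equality, which ADMM can dualize, and a simple box constraint on $p_c$, which stays inside the post-contingency subproblem.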


The above optimization problem can be solved distributively using ADMM. The scaled augmented Lagrangian can be calculated as

$$
\mathcal{L}_\rho(\{P^{g,c}\}_{c=0}^C; \{p_c\}_{c=1}^C; \{\mu_c\}_{c=1}^C) = \sum_{i \in \mathcal{G}} f_i^g(P_i^{g,0}) + \sum_{c=1}^C \frac{\rho_c}{2} \| P^{g,0} - P^{g,c} + p_c - \Delta_c + \mu_c \|_2^2. \tag{37}
$$

The optimization variables $P^{g,0}$, $P^{g,c}$, and $p_c$ are arranged into two groups, $\{P^{g,0}\}$ and $\{P^{g,c}, p_c\}$, and updated iteratively. The variables in each group are optimized in parallel on distributed computing nodes, and coordinated by the dual variable vector $\mu_c$ during each iteration.

At the $k$-th iteration, the $P^{g,0}$-update solves the base scenario with squared regularization terms enforced by the coupling constraints, expressed as

$$
\begin{aligned}
P^{g,0}[k+1] = \arg\min_{P^{g,0}} \quad & \sum_{i \in \mathcal{G}} f_i^g(P_i^{g,0}) + \sum_{c=1}^C \frac{\rho_c}{2} \| P^{g,0} - P^{g,c}[k] + p_c[k] - \Delta_c + \mu_c[k] \|_2^2, \\
\text{subject to} \quad & \text{constraints (25), (27), and (29).}
\end{aligned}
\tag{38}
$$

Fig. 4. Computing time for the IEEE 57 bus system, IEEE 118 bus system and IEEE 300 bus system with different numbers of contingency cases.


The $P^{g,c}$-update solves a number of independent optimization subproblems corresponding to the post-contingency scenarios, and can be calculated distributively at the $c$-th computing node via

$$
\begin{aligned}
P^{g,c}[k+1] = \arg\min_{P^{g,c},\, p_c} \quad & \frac{\rho_c}{2} \| P^{g,0}[k+1] - P^{g,c} + p_c - \Delta_c + \mu_c[k] \|_2^2, \\
\text{subject to} \quad & \text{constraints (26), (28), (30), and (36),}
\end{aligned}
\tag{39}
$$

where the scaled dual variable vector is also updated locally at the $c$-th computing node as

$$
\mu_c[k+1] = \mu_c[k] + P^{g,0}[k+1] - P^{g,c}[k+1] + p_c[k+1] - \Delta_c. \tag{40}
$$
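The scaled form used in (37) and (40) follows from the unscaled augmented Lagrangian by completing the square. As a sketch, let $r$ denote a generic constraint residual (a symbol introduced here for illustration), and use the sign convention of (3):

```latex
\begin{align*}
-\lambda^T r + \frac{\rho}{2}\|r\|_2^2
  &= \frac{\rho}{2}\Big\|r - \frac{\lambda}{\rho}\Big\|_2^2 - \frac{1}{2\rho}\|\lambda\|_2^2
   = \frac{\rho}{2}\|r + \mu\|_2^2 - \frac{\rho}{2}\|\mu\|_2^2,
  \qquad \mu := -\frac{\lambda}{\rho}, \\
\lambda^{k+1} = \lambda^k - \rho\, r^{k+1}
  &\iff \mu^{k+1} = \mu^k + r^{k+1}.
\end{align*}
```

With $r = P^{g,0} - P^{g,c} + p_c - \Delta_c$ and $\rho = \rho_c$, the first line reproduces the quadratic term of (37) up to a term that does not depend on the primal variables, and the second line is the update (40).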

At the $k$-th iteration, the original problem is divided into $C + 1$ subproblems of approximately the same size. The computing node handling $P^{g,0}$ needs to communicate with all computing nodes solving (39) during the iterations. The results of the $P^{g,0}$-update, $\{P^{g,0}\}$, are distributed among the computing nodes for the $P^{g,c}$-update. After the $P^{g,c}$-update, the computed $\{P^{g,c}, p_c, \mu_c\}$ are collected to calculate the pre-contingency control variables. The subproblem data are iteratively updated such that the block-coupling constraints (35) are satisfied at the end. Note that since each of the subproblems is a smaller-scale OPF problem, existing techniques for OPF can be applied with minor modifications. The proposed algorithm is illustrated in Algorithm 7.

Numerical studies are conducted to evaluate the performance of the proposed algorithm. Three classical test systems are used: the IEEE 57 bus, the IEEE 118 bus, and the IEEE 300 bus systems. The computing time for the test systems with different numbers of contingency cases is investigated and the results are given in Figure 4. The number of contingencies is increased by 20% each time and the computing time is recorded. It can be seen from these figures that, as the number of contingency cases for the SCOPF problem increases, the computing time of the centralized algorithm increases much faster than that of the proposed algorithm. Thus, the proposed distributed algorithm is more scalable and stable than the centralized approach.


Algorithm 7 Distributed SCOPF.
Input: $B_{bus}^c$, $B_f^c$, $A^{g,c}$, $P^{d,c}$, $\underline{P}^{g,c}$, $\overline{P}^{g,c}$, $\Delta_c$;
Initialize: $\theta^c$, $P^{g,c}$, $p_c$, $\mu_c$, $\rho_c$, $k = 0$;
while not converged do
    $P^{g,0}$-update:
        $P^{g,0}[k+1] = \arg\min_{P^{g,0}} \sum_{i \in \mathcal{G}} f_i^g(P_i^{g,0}) + \sum_{c=1}^C \frac{\rho_c}{2} \| P^{g,0} - P^{g,c}[k] + p_c[k] - \Delta_c + \mu_c[k] \|_2^2$
        subject to constraints (25), (27), and (29).
    $P^{g,c}$-update, distributively at each computing node:
        $P^{g,c}[k+1] = \arg\min_{P^{g,c},\, p_c} \frac{\rho_c}{2} \| P^{g,0}[k+1] - P^{g,c} + p_c - \Delta_c + \mu_c[k] \|_2^2$
        subject to constraints (26), (28), (30), and (36).
    $\mu_c[k+1] = \mu_c[k] + P^{g,0}[k+1] - P^{g,c}[k+1] + p_c[k+1] - \Delta_c$.
    Adjust the penalty parameter $\rho_c$ if necessary;
    $k = k + 1$;
end while
return $\theta^c$, $P^{g,c}$;
Output: $\theta^c$, $P^{g,c}$;


B. Mobile Data Offloading in SDN 

We consider a mobile network which consists of $B$ cellular base stations (BSs) and $A$ access points (APs). A BS $b \in \{1, \ldots, B\}$ serves a group of mobile users and has the demand to offload its traffic to APs. An AP $a \in \{1, \ldots, A\}$ is a WiFi or femtocell AP which operates in a different frequency band and supplies its bandwidth for data offloading. The maximum available capacity for data offloading of each AP $a$ is denoted by $C_a$. The SDN controller manages the BSs and APs through the access network discovery and selection function (ANDSF), and makes the mobile data offloading decisions according to various trigger criteria. Such criteria can be the number of mobile users per BS, the available bandwidth/IP addresses of each BS, or the aggregate number of flows on a specific port at a BS.

Let $x_b = [x_{b1}, \ldots, x_{bA}]^T$ represent the offloaded traffic of
BS $b$, where $x_{ba}$ denotes the data of BS $b$ offloaded through AP
$a$. Correspondingly, $y_a = [y_{a1}, \ldots, y_{aB}]^T$ represents the
admitted traffic of AP $a$, where $y_{ab}$ represents the admitted data
traffic from BS $b$. Generally, a feasible mobile data offloading
decision exists when BSs and APs reach an agreement on the
amount of offloading data, i.e., $x_{ba} = y_{ab}$, $\forall a$ and $\forall b$. We
assume, without loss of generality, that the mobile data of BSs
can be offloaded to all of the APs. Moreover, we assume that
time is slotted and that during each slot the offloading
demand from BSs is fixed. The SDN controller needs to find
a feasible offloading schedule at the beginning of each time
slot, while maximizing the utility of BSs at a reasonable cost
of APs.

We denote BS $b$'s utility of offloading its traffic to APs
by $U_b(x_b)$, where $U_b(\cdot)$ is designed to be a non-decreasing,
non-negative and concave function in $x_b$, $\forall b$. For example,
the function can be logarithmic, and the concavity is justified
because of diminishing returns of the resources allocated to
the offloaded data. Likewise, we use the function $L_a(y_a)$ to describe
AP $a$'s cost of helping BSs offload data, where $L_a(\cdot)$ is a
non-decreasing, non-negative and convex function in $y_a$, $\forall a$.
The cost function can be a linear cost function, which means
the total cost of APs will increase as the amount of admitted
mobile data increases.

For the SDN controller, the total revenue for mobile data
offloading is expressed as $\sum_{b=1}^{B} U_b(x_b) - \sum_{a=1}^{A} L_a(y_a)$.
To maximize the total revenue, the equivalent minimization
problem can be formulated as

$$\min_{\{x_1, \ldots, x_B\},\, \{y_1, \ldots, y_A\}} \; \sum_{a=1}^{A} L_a(y_a) - \sum_{b=1}^{B} U_b(x_b) \tag{41}$$

$$\text{s.t.} \quad \sum_{b=1}^{B} y_{ab} \le C_a, \quad \forall a, \tag{42}$$

$$x_{ba} = y_{ab}, \quad \forall a, b, \tag{43}$$

where (42) stands for the capacity constraint at each AP, and
(43) represents the consensus of BSs and APs on the amount
of mobile data.
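As a concrete reading of (41)-(43), the following minimal sketch evaluates the objective and checks feasibility for a candidate pair $(x, y)$. It assumes the logarithmic utility and linear cost that the numerical test later adopts; the function names, array layout, and tolerance are illustrative choices, not part of the original formulation.

```python
import numpy as np

def total_cost(x, y, theta):
    """Objective (41), assuming U_b(x_b) = log(1^T x_b + 1) and
    L_a(y_a) = theta_a * 1^T y_a.

    x: array of shape (B, A), x[b, a] = traffic BS b offloads via AP a
    y: array of shape (A, B), y[a, b] = traffic AP a admits from BS b
    theta: array of shape (A,), cost coefficient of each AP
    """
    utility = np.log(x.sum(axis=1) + 1.0).sum()
    cost = (theta * y.sum(axis=1)).sum()
    return cost - utility

def is_feasible(x, y, capacity, tol=1e-9):
    """Check the capacity constraint (42) and the consensus constraint (43)."""
    capacity_ok = np.all(y.sum(axis=1) <= capacity + tol)
    consensus_ok = np.allclose(x, y.T, atol=tol)
    return bool(capacity_ok and consensus_ok)
```

A pair is feasible only when every AP stays within its capacity and each BS-AP pair agrees on the offloaded amount; the distributed algorithm drives the iterates toward such a pair.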

We propose a fully distributed algorithm to solve the
optimization problem (41). The computing paradigm of the
proposed algorithm is shown in Figure 5 and can be described
as follows. During each iteration, the BSs and APs update $x$
and $y$ concurrently. The updated $x$ and $y$ are gathered by
the SDN controller, which performs a simple update on $\lambda$
and scatters the dual variables back to the BSs and APs. The
iteration goes on until a consensus on the offloading demand
and supply is reached. Specifically, we first calculate the partial
Lagrangian of (41), which introduces the Lagrange multipliers
only for constraint (43):

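$$\mathcal{L}_\rho(x, y, \lambda) = \sum_{a=1}^{A} L_a(y_a) - \sum_{b=1}^{B} U_b(x_b) - \sum_{a=1}^{A} \sum_{b=1}^{B} \lambda_{ab}(x_{ba} - y_{ab}) + \frac{\rho}{2} \sum_{a=1}^{A} \sum_{b=1}^{B} \|x_{ba} - y_{ab}\|_2^2, \tag{44}$$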

where $\lambda \in \mathbb{R}^{A \times B}$ is the Lagrange multiplier and $\rho$ is the penalty
parameter. The updates of BSs and APs can be performed
concurrently according to the proximal Jacobian multi-block
ADMM. We describe the update procedure of the BSs, APs
and SDN controller as follows.

Fig. 5. Distributed computing paradigm of proposed algorithm. (1) Gather: BSs and APs concurrently update $x$ and $y$, which are gathered by the controller. (2) Scatter: the controller updates $\lambda$, which is scattered to the BSs and APs.

Base Station Update: At each BS $b$, the update rule can
be expressed as

$$x_b^{k+1} = \arg\min_{x_b} \left( -U_b(x_b) + \frac{\rho}{2} \sum_{a=1}^{A} \|x_{ba} - p_{ab}^k\|_2^2 + \frac{1}{2} \|x_b - x_b^k\|_{P_i}^2 \right), \tag{45}$$

where $P_i = 0.1 I$, $I$ is the identity matrix, and $p_{ab}^k =
y_{ab}^k + \lambda_{ab}^k / \rho$, $\forall a$, is the 'signal' sent from the SDN controller to
BS $b$. The update (45) is a small-scale unconstrained convex
optimization problem. Note that the update of
each BS $b$ is performed independently and can be calculated
locally. Once $x_b$, which is of size $A$, is updated, it is sent to
the SDN controller, while the utility function $U_b(\cdot)$ is kept
confidential.

Access Point Update: The update rule at each AP $a$ can
be expressed as

$$y_a^{k+1} = \arg\min_{y_a} \left( L_a(y_a) + \frac{\rho}{2} \sum_{b=1}^{B} \|y_{ab} - q_{ba}^k\|_2^2 + \frac{1}{2} \|y_a - y_a^k\|_{P_j}^2 \right), \quad \text{s.t.} \; \sum_{b=1}^{B} y_{ab} \le C_a, \tag{46}$$

where $P_j = 0.1 I$ and $q_{ba}^k = x_{ba}^k - \lambda_{ab}^k / \rho$, $\forall b$, is the
'signal' from the SDN controller to AP $a$. The update (46) is a
small-scale convex optimization problem with a linear inequality
constraint. The update of $y$ is also performed
independently at each AP. During the update, the information
of the cost function $L_a(\cdot)$ is kept private. Once $y_a$, which is of
size $B$, is updated, it is sent to the SDN controller.
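The two 'signals' can be read off by completing the square in the partial Lagrangian (44). The short derivation below does this for a single pair $(a, b)$, under the assumption that the multiplier term enters (44) as $-\lambda_{ab}(x_{ba} - y_{ab})$, which is the sign convention consistent with the signals defined in (45) and (46).

```latex
\begin{align*}
-\lambda_{ab}^k \,(x_{ba} - y_{ab}^k) + \frac{\rho}{2}\,(x_{ba} - y_{ab}^k)^2
  &= \frac{\rho}{2}\Big(x_{ba} - \underbrace{\big(y_{ab}^k + \tfrac{\lambda_{ab}^k}{\rho}\big)}_{p_{ab}^k}\Big)^2
     - \frac{(\lambda_{ab}^k)^2}{2\rho}, \\
-\lambda_{ab}^k \,(x_{ba}^k - y_{ab}) + \frac{\rho}{2}\,(x_{ba}^k - y_{ab})^2
  &= \frac{\rho}{2}\Big(y_{ab} - \underbrace{\big(x_{ba}^k - \tfrac{\lambda_{ab}^k}{\rho}\big)}_{q_{ba}^k}\Big)^2
     - \frac{(\lambda_{ab}^k)^2}{2\rho}.
\end{align*}
```

The trailing constants do not depend on the optimization variable, so they drop out of the minimizations. Each BS therefore needs only $p_{ab}^k$ and each AP only $q_{ba}^k$, both of which the controller can form from the gathered iterates and the current multipliers.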

SDN Controller Update: At the SDN controller, the update
rule can be expressed as

$$\lambda_{ab}^{k+1} = \lambda_{ab}^k - \gamma \rho \left( x_{ba}^{k+1} - y_{ab}^{k+1} \right), \quad \forall a, b. \tag{47}$$










Fig. 6. Convergence performance of the proposed algorithm by objective 
value when (B = 5, A = 5) and (B = 5, A = 10). 


After gathering $x$ and $y$ from the BSs and APs, the SDN
controller updates the dual variable $\lambda$ by a
simple algebraic operation. After that, the 'signal' variables $p_{ab}$
and $q_{ba}$ are scattered back to the corresponding BSs and APs,
respectively. For each round of the update, the controller sends
$p_{ab}$, $\forall a$, which is of size $A$, to each BS $b$, and sends
$q_{ba}$, $\forall b$, which is of size $B$, to each AP $a$.

Note that in the Jacobian-type update, the iterations
of the BSs and APs are performed concurrently, instead of
consecutively as in the Gauss-Seidel-type update. There is no
direct communication between the BSs and APs, which keeps
the intermediate update results of $x$ and $y$ confidential to
each other. The updates at iteration $k + 1$ depend only on
values from iteration $k$, which enables a fully distributed
implementation.

At each iteration, the update operations at BSs and APs
are quite simple. The updates at each BS $b$ and AP $a$ are
small-scale convex optimization problems, which can
be quickly solved by many off-the-shelf tools like CVX [35].
As for the communication overhead, for each iteration the
signaling between each BS and the SDN controller is of size
$2A$ (the size of $x_b$ and $p_{ab}$, $\forall a$). Likewise, the signaling between
each AP and the SDN controller is of size $2B$ (the size of $y_a$
and $q_{ba}$, $\forall b$). These signaling messages are quite
small compared with the offloading message body and can be
communicated in the dedicated control channel. The proposed
distributed algorithm is described in Algorithm 8.
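The per-iteration signaling sizes quoted above can be tallied with a few lines of arithmetic. The sketch below counts scalar values only; the function name and the network-wide total are illustrative additions that follow from the per-node figures $2A$ and $2B$.

```python
def signaling_per_iteration(num_bs, num_ap):
    """Count scalar values exchanged with the controller in one iteration."""
    per_bs = 2 * num_ap   # x_b sent up (A values) plus p signals sent down (A values)
    per_ap = 2 * num_bs   # y_a sent up (B values) plus q signals sent down (B values)
    total = num_bs * per_bs + num_ap * per_ap   # equals 4 * A * B
    return per_bs, per_ap, total
```

For the two settings used in the numerical test, $(B, A) = (5, 5)$ gives 100 scalars per iteration across the whole network and $(B, A) = (5, 10)$ gives 200.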

Algorithm 8 Distributed Mobile Data Offloading

Initialize: $x^0$, $y^0$, $\lambda^0$, $\rho > 0$, $\gamma > 0$;
for $k = 0, 1, \ldots$ do

    {Update $x_b$ and $y_a$ for $b = 1, \ldots, B$ and $a = 1, \ldots, A$, concurrently.}

    {Base station update, $\forall b$}

    $$x_b^{k+1} = \arg\min_{x_b} \; -U_b(x_b) + \frac{\rho}{2} \sum_{a=1}^{A} \left\| x_{ba} - y_{ab}^k - \frac{\lambda_{ab}^k}{\rho} \right\|_2^2 + \frac{1}{2} \|x_b - x_b^k\|_{P_i}^2;$$

    {Access point update, $\forall a$}

    $$y_a^{k+1} = \arg\min_{y_a} \; L_a(y_a) + \frac{\rho}{2} \sum_{b=1}^{B} \left\| x_{ba}^k - y_{ab} - \frac{\lambda_{ab}^k}{\rho} \right\|_2^2 + \frac{1}{2} \|y_a - y_a^k\|_{P_j}^2, \quad \text{s.t.} \; \sum_{b=1}^{B} y_{ab} \le C_a;$$

    {SDN controller update}

    $$\lambda_{ab}^{k+1} = \lambda_{ab}^k - \gamma \rho \left( x_{ba}^{k+1} - y_{ab}^{k+1} \right), \quad \forall a, b;$$

end for

Output $x$, $y$;

We consider a wireless access network consisting of $B = 5$
base stations and $A \in \{5, 10\}$ access points coordinated by
the SDN controller. The SDN controller offloads mobile
data traffic of BSs to APs, and the available capacity of each
AP for offloading is $C_a = 10$ Mbps. The utility function of BS
$b$ is $U_b(x_b) = \log(x_b^T \mathbf{1} + 1)$, where $\mathbf{1}$ is the all-ones vector. The
cost function of AP $a$ is a linear cost expressed as $L_a(y_a) =
\theta_a \cdot y_a^T \mathbf{1}$, where $\theta_a > 0$ is the cost coefficient. The value of $\theta_a$
is application specific. During numerical tests, we assume for
simplicity that $\theta_a$ is a Gaussian random variable with
distribution $\mathcal{N}(0, 1)$. We perform numerical tests on the offloading
decision for one time slot. The simulation result is shown in
Figure 6. It shows that the proposed algorithm converges to
the optimal objective in a moderate number of iterations when
$B = 5$ and $A = 5$. It takes longer for the proposed
algorithm to converge when $A = 10$. This indicates that when
there are more APs in the access network, it takes longer
for the SDN controller to coordinate BSs and APs toward a
consensus on the offloading demand and supply.
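To make the gather/scatter iteration concrete, the following is a minimal single-process sketch of the offloading algorithm for the logarithmic utility and linear cost used in the numerical test. For these two functions the local subproblems admit closed-form solutions, so no solver is needed. Several ingredients are assumptions made for illustration and are not taken from the original experiment: the values `rho=0.05` and `gamma=1.0`, the zero initialization, the positive uniform draw for $\theta_a$, and all function names. The proximal weight `tau=0.1` corresponds to $P_i = P_j = 0.1 I$. As in (41)-(43), no non-negativity constraint is imposed on $x$ or $y$.

```python
import numpy as np

def bs_update(p, x_prev, rho, tau):
    """Closed-form minimiser of
    -log(1^T x + 1) + rho/2 ||x - p||^2 + tau/2 ||x - x_prev||^2."""
    w = rho + tau
    c = (rho * p + tau * x_prev).sum()
    # s = 1^T x solves w*s^2 + (w - c)*s - (c + A) = 0; the larger root has s > -1.
    s = ((c - w) + np.sqrt((w + c) ** 2 + 4.0 * w * p.size)) / (2.0 * w)
    return (rho * p + tau * x_prev + 1.0 / (s + 1.0)) / w

def ap_update(q, y_prev, theta, cap, rho, tau):
    """Minimiser of theta*1^T y + rho/2 ||y - q||^2 + tau/2 ||y - y_prev||^2
    subject to 1^T y <= cap."""
    y = (rho * q + tau * y_prev - theta) / (rho + tau)   # unconstrained minimiser
    excess = y.sum() - cap
    if excess > 0:
        # The objective is an isotropic quadratic around y, so the constrained
        # minimiser is the Euclidean projection onto the half-space 1^T y <= cap.
        y = y - excess / y.size
    return y

def offload(B=5, A=5, cap=10.0, rho=0.05, gamma=1.0, tau=0.1, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.1, 1.0, size=A)   # hypothetical positive cost coefficients
    x = np.zeros((B, A))                    # x[b, a]
    y = np.zeros((A, B))                    # y[a, b]
    lam = np.zeros((A, B))                  # lam[a, b]
    for _ in range(iters):
        # Scatter: the controller forms the signals from the previous iterates.
        p = y + lam / rho                   # p[a, b], sent to BS b
        q = x.T - lam / rho                 # q[a, b], sent to AP a
        # Jacobian step: both sides update using iteration-k values only.
        x_new = np.stack([bs_update(p[:, b], x[b], rho, tau) for b in range(B)])
        y_new = np.stack([ap_update(q[a], y[a], theta[a], cap, rho, tau)
                          for a in range(A)])
        x, y = x_new, y_new
        # Gather: the controller updates the multipliers from the new iterates.
        lam = lam - gamma * rho * (x.T - y)
    residual = np.abs(x.T - y).max()        # largest consensus violation
    return x, y, lam, residual
```

Calling `offload()` returns the final iterates together with the largest consensus violation $\max_{a,b} |x_{ba} - y_{ab}|$, which can be monitored as a stopping criterion. How quickly it shrinks depends on `rho`, `gamma`, and `tau`, so these would need tuning for any particular setting.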

C. Other Extensions of ADMM 

Decentralized state estimation in smart grid: Previous
work on SCOPF presented in Section IV-A used the direct current
(DC) power flow approximation for system state estimation
and optimal power flow dispatch. The DC approximation
model can provide quick operation instructions for the system.
For precise system status monitoring and operation, the alternating
current (AC) power flow equations are needed:

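$$P_i = \sum_{k=1}^{N} |V_i| |V_k| \left( G_{ik} \cos\theta_{ik} + B_{ik} \sin\theta_{ik} \right), \tag{48}$$

$$Q_i = \sum_{k=1}^{N} |V_i| |V_k| \left( G_{ik} \sin\theta_{ik} - B_{ik} \cos\theta_{ik} \right), \tag{49}$$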

where $P_i$ and $Q_i$ are the real power flow and reactive power flow
at bus $i$, respectively. $|V_i|$ is the voltage magnitude at bus $i$.
$G_{ik}$ and $B_{ik}$ are the real and imaginary parts of the $(i, k)$th
element of the bus admittance matrix. $\theta_{ik}$ is the voltage phase
angle difference between bus $i$ and bus $k$. The problem of state
estimation is how to find voltage magnitudes and phase angles
given the nonlinear equations of real and reactive power flows in
the system.
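The forward evaluation of (48) and (49), which a state estimator would invert, can be sketched in a few lines. The function name and the vectorized layout are illustrative choices; the admittance matrix is assumed to be supplied as a complex array $Y = G + jB$.

```python
import numpy as np

def power_injections(v_mag, v_ang, Y):
    """Evaluate (48)-(49) at every bus.

    v_mag: (N,) voltage magnitudes |V_i|
    v_ang: (N,) voltage phase angles in radians
    Y:     (N, N) complex bus admittance matrix, Y = G + jB
    """
    G, Bm = Y.real, Y.imag
    dtheta = v_ang[:, None] - v_ang[None, :]   # theta_ik = theta_i - theta_k
    vv = np.outer(v_mag, v_mag)                # |V_i| |V_k|
    P = (vv * (G * np.cos(dtheta) + Bm * np.sin(dtheta))).sum(axis=1)
    Q = (vv * (G * np.sin(dtheta) - Bm * np.cos(dtheta))).sum(axis=1)
    return P, Q
```

The result agrees with the complex-power form $S = V \odot \overline{(YV)}$, where $P = \mathrm{Re}(S)$ and $Q = \mathrm{Im}(S)$, which provides a convenient cross-check.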

Smart meter reading data clustering: The advanced metering
infrastructure (AMI) enables two-way communications
with the meter. The smart meters are able to record the
consumption of electric energy of each household and send
readings to data centers of utility companies for billing and
customer service. This provides real-time information about
electric energy consumption and behaviors of consumers,
which can be used for data mining. The smart meters record
the electric energy consumption of consumers every fifteen minutes
(96 readings per meter per day), which means that a substantial
amount of data is generated daily in the U.S. By investigating
those data, we can better understand profiles of consumers to
ensure the quality of service, develop targeted electric energy
plans, and accurately predict energy consumption of the power
system.

Efficient air quality monitoring: Air pollution has become
a major concern for public health. In 2012, around
seven million people died worldwide due to air pollution.
However, the existing air-quality monitoring network has very
low spatial and temporal coverage, which severely limits its
ability to predict air quality and to analyze its impact on
environment, climate, and public health. Fortunately, there
exists a large amount of diverse data, such as satellite remote
sensing data, meteorological data (temperature, wind, pressure,
humidity, etc.), and traffic data (volume, speed, congestion),
which can be utilized. Instead of relying solely on the traditional
monitoring network to provide air quality data,
many heterogeneous big data sources can be used to develop
innovative big data processing methods in air quality research.

V. Conclusion 

In this paper, we have reviewed several distributed and
parallel optimization methods based on the ADMM for large-scale
optimization problems. We have introduced the background
of ADMM and described several direct extensions and
sophisticated modifications of ADMM from 2-block to N-block
settings. We have explained the iterative schemes and
convergence properties for each extension/modification. We
have illustrated the implementations on large-scale computing
facilities, and enumerated several applications of N-block
ADMM in modern communication networks.